Tether AI Upgrades QVAC SDK, Bringing TurboQuant to Everyday Devices, Giving Local AI Data Center-Sized Memory

Tether’s open-source TurboQuant release compresses the memory AI needs during long sessions, letting laptops, phones, edge devices, and decentralized networks handle larger documents, longer conversations, codebases, and personal AI assistants without sending everything to the cloud


1 June 2026 – Tether’s AI Research Group today announced the production release of its open-source implementation of TurboQuant, the Google Research memory compression algorithm that drew comparisons to “Pied Piper”, the fictional compression startup from the TV series Silicon Valley, for its ability to dramatically reduce the memory large AI models need to run. Google’s TurboQuant was a research breakthrough; Tether is now bringing it into production through QVAC Fabric, its open-source local/edge AI engine. Fabric began as a llama.cpp-based engine and now incorporates several breakthroughs that push the boundaries of local, on-device intelligence.

The release turns TurboQuant from a paper into open-source software that developers can use, test, and adapt across laptops, consumer GPUs, mobile chips, edge devices, and decentralized inference networks. It includes a full quantization pipeline, adapters for common inference frameworks, developer documentation, and workload-tuned profiles designed for real deployment outside hyperscale data centers. This matters because memory is one of the biggest reasons useful AI tasks still get pushed to the cloud.

When someone uses an AI assistant, the model needs memory not only to load its weights but also as working memory to keep track of the conversation, document, codebase, or instructions it has already seen. That working memory is called the KV cache: it stores the attention keys and values computed for every token processed so far, so the model does not have to recompute them, and it grows as the session gets longer. A short prompt may be easy to handle. A full contract, financial filing, research report, book, code repository, or several hours of conversation can push memory requirements beyond what most laptops, phones, and consumer GPUs can support.
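To make the idea of “working memory” concrete, here is a minimal, illustrative Python sketch of a KV cache for a single attention head. It is a simplification and not QVAC Fabric’s or llama.cpp’s code: real models compute keys and values from learned projections and keep a cache per layer and per key/value head, whereas the class name and the vectors below are hypothetical. The point is only that every new token adds one key and one value, and every new step reads all of them, so the cache grows with the session.

```python
import math


class ToyKVCache:
    """Single-layer, single-head KV cache (illustrative sketch only).

    Every processed token appends one key vector and one value vector.
    Nothing is evicted, so memory use grows linearly with session length.
    """

    def __init__(self):
        self.keys = []    # one key vector per token seen so far
        self.values = []  # one value vector per token seen so far

    def append(self, key, value):
        self.keys.append(key)
        self.values.append(value)

    def attend(self, query):
        """Scaled dot-product attention of `query` over all cached tokens."""
        scale = 1.0 / math.sqrt(len(query))
        scores = [scale * sum(q * k for q, k in zip(query, key)) for key in self.keys]
        peak = max(scores)  # subtract the max score for numerical stability
        weights = [math.exp(s - peak) for s in scores]
        total = sum(weights)
        dim = len(self.values[0])
        return [sum(w * v[i] for w, v in zip(weights, self.values)) / total
                for i in range(dim)]

    def __len__(self):
        return len(self.keys)


cache = ToyKVCache()
for step in range(5):
    # Hypothetical vectors; a real model derives them from the token's hidden state.
    cache.append(key=[float(step), 1.0], value=[1.0, float(step)])
    output = cache.attend(query=[0.5, 0.5])  # each step reads the whole cache
print(len(cache))  # 5
```

A real decoder keeps one such cache per layer and per key/value head, which is where multi-gigabyte totals come from in long sessions.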


At roughly 262,000 tokens (about the length of several hours of conversation or a few hundred pages of text), the KV cache for a 4-billion-parameter (4B) model can take up about 8 GB of memory on its own. Four sessions at that size can push the cache alone to around 32 GB, before accounting for the memory needed to load the model itself. That is why many AI experiences still rely on remote data centers, even when users would prefer to keep their work local.
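The arithmetic behind these figures follows from how the cache is laid out: for every token, each layer stores one key vector and one value vector per key/value head. A rough Python estimate is below. The model configuration is hypothetical, picked only so the result lands near the 8 GB figure above; real 4B models use different layer counts and head layouts (and some use sliding-window attention), so their numbers will differ. Results are in GiB (2^30 bytes), and the last line simply applies the best-case 5x ratio quoted for TurboQuant.

```python
def kv_cache_bytes(num_layers, num_kv_heads, head_dim, num_tokens, bytes_per_elem=2):
    """Rough KV-cache size for a standard transformer decoder.

    The factor of 2 accounts for one key tensor and one value tensor per layer.
    bytes_per_elem=2 corresponds to a 16-bit (fp16/bf16) cache.
    """
    return 2 * num_layers * num_kv_heads * head_dim * bytes_per_elem * num_tokens


GIB = 2 ** 30

# Hypothetical configuration, not the layout of any specific 4B model.
per_session = kv_cache_bytes(num_layers=32, num_kv_heads=2, head_dim=128,
                             num_tokens=262_144)

print(per_session / GIB)          # 8.0  GiB for one session
print(4 * per_session / GIB)      # 32.0 GiB for four sessions
print(4 * per_session / 5 / GIB)  # 6.4  GiB for four sessions at 5x compression
```

Every factor multiplies the total, so doubling the context length doubles the cache; that is why session length, not just model size, drives memory use.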

TurboQuant changes that equation by compressing the KV cache by up to 5x while keeping output quality close to that of an uncompressed model. In practical terms, this means local AI can handle longer conversations, larger files, more context, and heavier workloads on the hardware people already own.
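At a high level, Google’s paper describes TurboQuant as applying a random rotation to each vector, so that its coordinates follow a predictable, concentrated distribution, and then quantizing each coordinate independently with a small codebook. The Python sketch below illustrates only that core idea on a single cached key vector. It is a deliberate simplification, not TurboQuant itself and not Tether’s QVAC Fabric code: it uses a plain uniform grid instead of the paper’s optimized codebooks, omits refinements the paper describes (such as its residual correction for inner products), and all function names are hypothetical.

```python
import math
import random


def random_rotation(dim, seed=0):
    """Random orthogonal matrix (orthonormal rows) via Gram-Schmidt on Gaussian vectors."""
    rng = random.Random(seed)
    rows = []
    while len(rows) < dim:
        v = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        for r in rows:  # remove the component along each existing row
            dot = sum(a * b for a, b in zip(v, r))
            v = [a - dot * b for a, b in zip(v, r)]
        norm = math.sqrt(sum(a * a for a in v))
        if norm > 1e-9:
            rows.append([a / norm for a in v])
    return rows


def rotate(rot, v):
    """Compute R @ v."""
    return [sum(a * b for a, b in zip(row, v)) for row in rot]


def unrotate(rot, v):
    """Compute R.T @ v, which undoes rotate() because R is orthogonal."""
    out = [0.0] * len(rot[0])
    for row, c in zip(rot, v):
        for j, a in enumerate(row):
            out[j] += a * c
    return out


def clip_range(dim):
    # After rotation, coordinates of a unit vector are roughly N(0, 1/dim),
    # so +/- 3 standard deviations covers almost all of them.
    return 3.0 / math.sqrt(dim)


def quantize(vec, rot, bits):
    """Store a vector as (norm, list of small integer codes)."""
    norm = math.sqrt(sum(a * a for a in vec))
    unit = [a / norm for a in vec] if norm > 0 else list(vec)
    lim, top = clip_range(len(vec)), 2 ** bits - 1
    codes = []
    for x in rotate(rot, unit):
        x = max(-lim, min(lim, x))  # clip rare outliers
        codes.append(round((x + lim) / (2 * lim) * top))
    return norm, codes


def dequantize(norm, codes, rot, bits):
    """Map codes back to grid values, undo the rotation, and restore the norm."""
    lim, top = clip_range(len(codes)), 2 ** bits - 1
    approx = [c / top * 2 * lim - lim for c in codes]
    return [norm * a for a in unrotate(rot, approx)]


dim, bits = 64, 4
rot = random_rotation(dim)  # shared by all vectors; regenerated from the seed
rng = random.Random(1)
key = [rng.gauss(0.0, 1.0) for _ in range(dim)]

norm, codes = quantize(key, rot, bits)
restored = dequantize(norm, codes, rot, bits)
err = math.sqrt(sum((a - b) ** 2 for a, b in zip(key, restored))) / norm
print(f"relative error: {err:.3f}")
```

On storage, a 16-bit cache spends 16 bits per coordinate. Assuming a 128-dimensional head and one 32-bit norm per vector, the 4-bit setting above would store 128 × 4 + 32 = 544 bits instead of 2,048, about 3.8x smaller; at around 3 bits per coordinate (416 bits) the ratio comes close to 5x. How much quality survives at such low bit widths depends on the codebook design, which this sketch does not attempt to reproduce.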

For users, this can mean asking an AI assistant on a laptop to read and analyze a hundred-page legal document without uploading the full file to a cloud provider. It can mean a student using an on-device tutor that retains an entire study session rather than losing context after a few messages. It can mean a developer running a local coding assistant that understands more of a codebase at once. It can mean a journalist, doctor, researcher, or small business owner using AI on sensitive files while keeping more of that work on the device.

For developers and startups, it means larger AI products can be built without assuming access to expensive GPU clusters. Instead of designing around short context windows, strict memory limits, or cloud-only deployment, teams can use TurboQuant to support longer sessions, larger workloads, and more flexible deployment across consumer hardware, edge devices, and peer-to-peer networks.

“Google’s research showed that AI memory could be compressed far more efficiently than most people assumed. Our work brings that breakthrough into production software that developers, startups, and users can actually build with,” said Paolo Ardoino, CEO of Tether. “If long-context AI only works inside the largest data centers, then AI will be shaped by whoever owns the most hardware. TurboQuant changes what local AI can do by making memory less of a wall.”

“People should be able to ask an AI assistant to read a long document, remember a project, help with code, or work through private information without every task being forced through a remote data center,” he added. “This is what bringing TurboQuant to production makes possible. It gives local AI more memory, more context, and more room to become useful in everyday life.”

Tether’s implementation is designed for environments where production AI often runs into limits: constrained device memory, mixed hardware, long sessions, latency pressure, and deployment outside centralized cloud infrastructure. Rather than requiring teams to rebuild the research themselves, the open-source release provides the AI developer community with a shared foundation for testing, improving, and adapting TurboQuant across different systems.

TurboQuant will be included in QVAC SDK 0.12.0, making it available directly through Fabric, one of the SDK’s core building blocks. QVAC SDK is the recommended integration path for developers building within Tether’s AI ecosystem, and it brings together the full set of QVAC tools, libraries, and runtime components needed to build local AI applications across devices and environments.

The release also advances Tether’s broader AI strategy. The company is building toward AI that can operate closer to users, across personal devices, local networks, and decentralized infrastructure, rather than relying solely on centralized APIs and hyperscale data centers. Large-scale compute will remain important, but Tether believes the next phase of AI will also be defined by software efficiency, portability, and the ability to run capable models where people actually use them.
